Add specific circuit failure error codes. Fixes #3717, #3543, #3364, #2888, #2859, #1580 #3792
func isNetworkTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr)
}
Is this function wrong? It matches any net error, not only timeouts, and it looks like the ordering of the switch statement is compensating for that. I think we can just augment this function instead:
func isNetworkTimeout(err error) bool {
	var netErr net.Error
	return errors.As(err, &netErr) && netErr.Timeout()
}
It looks like the implementation as a whole is fine, but the ordering is fragile. I think this change would help reduce that fragility.
case ctrl_msg.ErrorTypeInvalidLinkDestination:
	failureCause = CircuitFailureRouterErrInvalidLinkDest
Is there a reason this one case doesn't call self.serviceCounters.ServiceDialOtherError()?
Yes. It's not a dial error; it's a route failure caused by a link going away, so it doesn't count as a service dial error. However, I think semantically we're using ServiceDial to mean the circuit dial, not the dial on the router, so it probably should update that counter. Fixing.
t.Run("timeout", func(t *testing.T) {
	err := fmt.Errorf("dial failed: %w", syscall.ETIMEDOUT)
	require.Equal(t, byte(ctrl_msg.ErrorTypeDialTimedOut), classifyDialError(err))
})
Missing the *net.OpError variant, which is what Go actually returns in most paths: a *net.OpError wrapping syscall.ETIMEDOUT.
Example: &net.OpError{Err: &os.SyscallError{Err: syscall.ETIMEDOUT}}
andrewpmartinez left a comment
Small improvements suggested.
…ixes #3364, fixes #2888, fixes #2859, fixes #1580

- adds ErrorType constants for rejected-by-application, DNS resolution failed, port not allowed, invalid link destination, and resources not available
- adds corresponding CircuitFailureCause strings reported in circuit events
- extracts classifyDialError() in route handler to map dial errors to specific error codes using typed errors, syscall constants, and string matching
- detects DNS errors via *net.DNSError and string fallback for ER/T hosted services where errors are serialized through the SDK message protocol
- detects resource exhaustion via EMFILE, ENFILE, ENOBUFS syscall errors
- introduces InvalidLinkDestinationError typed error in forwarder package
- adds unit tests covering all 16 classification cases
- adds integration tests for rejected-by-application (SDK host), DNS resolution failed, connection refused, and port not allowed (ER/T host mode) with circuit event verification
- adds CreateEnrollAndStartTunnelerEdgeRouterWithCfgTweaks to test context